Learning Distributed Linguistic Classes
Abstract
Error-correcting output codes (ECOC) have emerged in machine learning as a successful implementation of the idea of distributed classes. Monadic class symbols are replaced by bit strings, which are learned by an ensemble of binary-valued classifiers (dichotomizers). In this study, the idea of ECOC is applied to memory-based language learning with local (k-nearest neighbor) classifiers. Regression analysis of the experimental results reveals that, in order for ECOC to be successful for language learning, the use of the Modified Value Difference Metric (MVDM) is an important factor, which is explained in terms of population density of the class hyperspace.

1 Introduction

Supervised learning methods applied to natural language classification tasks commonly operate on high-level symbolic representations, with linguistic classes that are usually monadic, without internal structure (Daelemans et al., 1996; Cardie et al., 1999; Roth, 1998). This contrasts with the distributed class encoding commonly found in neural networks (Schmid, 1994). Error-correcting output codes (ECOC) have been introduced to machine learning as a principled and successful approach to distributed class encoding (Dietterich and Bakiri, 1995; Ricci and Aha, 1997; Berger, 1999). With ECOC, monadic classes are replaced by codewords, i.e. binary-valued vectors. An ensemble of separate classifiers (dichotomizers) must be trained to learn the binary subclassifications for every instance in the training set. During classification, the bit predictions of the various dichotomizers are combined to produce a codeword prediction. The class codeword that has minimal Hamming distance to the predicted codeword determines the classification of the instance. Codewords are constructed such that their Hamming distance is maximal. Extra bits are added to allow for error recovery, so that the correct class can still be determined even if some bits are wrong.
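The decoding step described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the four class labels and the 7-bit codebook are hypothetical, chosen so that every pair of codewords differs in 4 bits, and the "predicted" bit string stands in for the combined output of seven dichotomizers.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits, codebook):
    """Return the class whose codeword is closest (in Hamming distance)
    to the bit string predicted by the ensemble of dichotomizers."""
    return min(codebook, key=lambda label: hamming(codebook[label], predicted_bits))


# Hypothetical 4-class codebook (labels and bits are assumptions for
# illustration). Every pair of codewords is at Hamming distance 4,
# so a single wrong bit can be corrected.
CODEBOOK = {
    "NOUN": [1, 1, 1, 1, 1, 1, 1],
    "VERB": [0, 0, 0, 0, 1, 1, 1],
    "ADJ":  [0, 0, 1, 1, 0, 0, 1],
    "ADV":  [0, 1, 0, 1, 0, 1, 0],
}

# The codeword for "ADJ" with one bit (position 5) flipped by a
# mistaken dichotomizer.
noisy = [0, 0, 1, 1, 0, 1, 1]
predicted = decode(noisy, CODEBOOK)  # "ADJ": distance 1, versus 3 to the others
```

Because the nearest codeword is still the correct one after a single bit error, the ensemble as a whole recovers from the mistake of one dichotomizer.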
An error-correcting output code for a k-class problem constitutes a matrix with $k$ rows and $2^{k-1} - 1$ columns. Rows are the codewords corresponding to classes, and columns are binary subclassifications or bit functions $f_i$ such that, for an instance $e$ and its codeword vector $c$,

$$f_i(e) = \pi_i(c) \qquad (1)$$

where $\pi_i(v)$ is the $i$-th coordinate of vector $v$. If the minimum Hamming distance between any two codewords is $d$, then the code has an error-correcting capability of $\lfloor (d-1)/2 \rfloor$ bits. Figure 1 shows the 5 × 15 ECOC matrix for a 5-class problem. In this code, every codeword has a Hamming distance of at least 8 to the other codewords, so this code has an error-correcting capability of 3 bits.

[Figure 1: ECOC for a five-class problem.]

ECOC have two natural interpretations. From an information-theoretic perspective, classification with ECOC is like channel coding (Shannon, 1948): the class of a pattern to be classified is a datum sent over a noisy communication channel. The communication channel consists of the trained classifier. The noise consists of the bias (systematic error) and variance (training set-dependent error) of the classifier, which together make up for the overall error
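The matrix size and error-correcting capability stated for the five-class case can be checked with a short sketch. It assumes the exhaustive-code construction described by Dietterich and Bakiri (1995): the first row is all ones and the remaining rows enumerate the binary patterns that omit the all-ones column. The matrix in Figure 1 need not have this exact column order, so the sketch illustrates the dimensions and distances rather than reproducing the figure.

```python
from itertools import combinations


def exhaustive_code(k):
    """Build a k x (2**(k-1) - 1) exhaustive code matrix (assumed
    construction): row 0 is all ones; row i holds bit (k-1-i) of the
    column index m, for m = 0 .. 2**(k-1) - 2."""
    n_cols = 2 ** (k - 1) - 1
    rows = [[1] * n_cols]
    for i in range(1, k):
        shift = k - 1 - i
        rows.append([(m >> shift) & 1 for m in range(n_cols)])
    return rows


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(code):
    """Smallest Hamming distance over all pairs of codewords."""
    return min(hamming(a, b) for a, b in combinations(code, 2))


code = exhaustive_code(5)
d = min_distance(code)       # 8 for the five-class exhaustive code
capability = (d - 1) // 2    # floor((d - 1) / 2) = 3 correctable bits
```

For k = 5 this yields a 5 × 15 matrix with minimum distance 8, which agrees with the error-correcting capability of 3 bits given in the text.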